In this tutorial, we'll first go over some basics of working with and visualizing social network data (in this R notebook).
Then we'll go through how to extract region-wise inter-subject fMRI response time series (dis)similarities from members of a social network we have characterized (in a second notebook).
Finally, we'll relate the distance between individuals in the network to region-wise time-series similarities (in a third notebook).
This first part of the tutorial will primarily use the R package igraph. If you prefer Python, there's also an igraph module for Python that you can use. Both are based on the same C library.
This notebook uses the R kernel, but the other 2 notebooks are Python-based.
if(!require(igraph)) install.packages("igraph"); require(igraph)
if(!require(RColorBrewer)) install.packages("RColorBrewer"); require(RColorBrewer)
if(!require(jsonlite)) install.packages("jsonlite"); require(jsonlite)
Any network consists of nodes (or vertices) connected by ties (or edges).
Nodes in a network can be people, groups, concepts or other entities.
The connections between them (edges) can also reflect many things, including interactions between nodes (e.g., who gives advice to whom, co-authorships, romantic partners, emails, phone calls), affiliations (e.g., belonging to the same club, living or working in the same place), and roles (e.g., kinship ties, friendships).
Edges can be weighted, if they take on varying values (e.g., to indicate the strength of a relationship or the frequency of contact), or unweighted, if every connection is either 0 or 1 (e.g., the mere presence or absence of a reported friendship).
Edges can also be directed or undirected. Edges based on things like common affiliations or behaviors would necessarily be undirected (symmetrical). However, if we define our edges based on things like who gets advice from whom, or who sends emails to whom, they would be directed (e.g., a connection can exist from node A to node B, and not from node B to node A).
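To make the weighted/unweighted and directed/undirected distinctions concrete, here is a minimal base-R sketch using a hypothetical three-person adjacency matrix (the names A, B, C and the contact counts are made up for illustration). Rows are sources and columns are targets, so a directed network's matrix need not be symmetric, whereas an undirected one always is.

```r
# Hypothetical toy adjacency matrix: rows are sources, columns are targets.
people <- c("A", "B", "C")
W <- matrix(0, nrow = 3, ncol = 3, dimnames = list(people, people))
W["A", "B"] <- 5  # weighted and directed: A contacted B 5 times
W["B", "C"] <- 2  # B contacted C twice; C never contacted B

isSymmetric(W)  # FALSE: a directed network need not be symmetric

# Unweighted version: 1 if a tie exists in that direction, 0 otherwise
U <- (W > 0) * 1

# One way to get an undirected version: i and j are tied if either named the other
S <- ((W + t(W)) > 0) * 1
isSymmetric(S)  # TRUE
```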
As a quick example, we could define a network based on who supervises whom in Luke's lab, and it might look something like this, with everyone connected to Luke through a directed edge:
edges <- c("Luke", "Eshin",
"Luke", "Andy",
"Luke", "Jin",
"Luke", "Emma",
"Luke", "Daisy")
g_who_supervises_whom <- make_graph(edges, directed = TRUE)
plot(g_who_supervises_whom,
vertex.label.family = "sans",
vertex.shape = "none")
Alternatively, we could define edges between these people based on a shared attribute, such as who works in Moore Hall. In that case, everyone would be connected to everyone else via undirected ties:
g_works_in_moore <- make_full_graph(n = length(unique(edges)),
directed = FALSE,
loops = FALSE)
V(g_works_in_moore)$name <- unique(edges)
plot(g_works_in_moore, vertex.label.family = "sans", vertex.shape = "none")
We wouldn't normally enter our nodes and edges manually like in the first example above. We would usually import the data from another file, which would typically summarize the connections between nodes in the form of an adjacency matrix, like this:
as_adjacency_matrix(g_who_supervises_whom)
Or an edge list, like this, where there is a row corresponding to each edge, with the first column being the source and the second column being the target:
as_edgelist(g_who_supervises_whom)
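Converting between the two representations takes only a few lines of base R. The sketch below uses a hypothetical toy edge list (not the study data) and plain matrices rather than igraph objects, just to show the bookkeeping: each (source, target) row becomes a 1 in the corresponding cell, and the nonzero cells map back to rows.

```r
# Hypothetical toy edge list (source -> target), mirroring the supervision example
toy_edges <- data.frame(source = c("Luke", "Luke", "Luke"),
                        target = c("Eshin", "Andy", "Jin"),
                        stringsAsFactors = FALSE)
toy_nodes <- unique(c(toy_edges$source, toy_edges$target))

# Edge list -> adjacency matrix: one 1 per (source, target) row
toy_adj <- matrix(0, length(toy_nodes), length(toy_nodes),
                  dimnames = list(toy_nodes, toy_nodes))
toy_adj[cbind(match(toy_edges$source, toy_nodes),
              match(toy_edges$target, toy_nodes))] <- 1

# Adjacency matrix -> edge list: row/column positions of the nonzero cells
idx <- which(toy_adj != 0, arr.ind = TRUE)
toy_edges_back <- data.frame(source = rownames(toy_adj)[idx[, "row"]],
                             target = colnames(toy_adj)[idx[, "col"]],
                             stringsAsFactors = FALSE)
```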
In the case of the social network data we'll work with today, we'll import an edge list of the reported friendships among members of an academic program. Let's read in the edge list and take a peek at what it looks like.
# load in the edge list
edge_list <- read.csv("./data/socialnetworks/anon_edge_list.csv")
# see what it looks like
head(edge_list)
You can see there are 2 columns: a source and a target. In our data, in a given row, the "source" is the person who named the "target" as their friend.
Now let's convert our data from an edge list to an igraph object, and inspect it.
our_graph <- graph_from_data_frame(edge_list, directed = TRUE)
our_graph
You can see there are 279 nodes and 7,705 edges connecting them. The 2 letters preceding those numbers indicate that the graph's edges are directed ('D') and that the nodes have names ('N').
It's simple to access the nodes/vertices (i.e., people) and edges (i.e., reported friendships) of our graph:
E(our_graph)
V(our_graph)
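Before we continue, let's clean up the graph by removing self-loops (people who nominated themselves as friends). By default, simplify() also merges any duplicate edges between the same pair of nodes (remove.multiple = TRUE).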
our_graph <- simplify(our_graph, remove.loops = TRUE)
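Roughly speaking, this cleanup amounts to dropping self-nominations and collapsing repeated edges. Here is a minimal base-R sketch on a hypothetical toy edge list; note that igraph's simplify() additionally combines edge attributes when it merges edges, which this sketch ignores.

```r
# Hypothetical nominations: "b" names themself, and "a" -> "b" appears twice
noms <- data.frame(source = c("a", "a", "b", "b", "c"),
                   target = c("b", "b", "b", "c", "a"),
                   stringsAsFactors = FALSE)
no_loops <- noms[noms$source != noms$target, ]  # drop self-nominations
no_loops_no_dups <- unique(no_loops)            # collapse repeated edges
nrow(no_loops_no_dups)  # 3 edges remain: a->b, b->c, c->a
```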
Now let's visualize the network.
With igraph's default plotting parameters, our network doesn't look very nice:
plot(our_graph)
That ugly jumble of shapes and letters doesn't really convey anything meaningful, but we can begin to remedy that with a few tweaks.
As a start, let's hide the labels, specify sizes for the vertices, edges, and arrows that won't cause so much overlap, and make the edges a higher-contrast color.
V(our_graph)$size <- 3
E(our_graph)$arrow.size <- 0.2
E(our_graph)$width <- 0.5
E(our_graph)$color <- "black"
plot(our_graph, vertex.label = NA)
That's starting to look mildly better, but we still have something of a hairball on our hands...
Since this is such a densely inter-connected network, it may be clearer to only display reciprocally reported friendships (i.e., ones that the sources and targets both reported).
our_graph_mutual <- as.undirected(our_graph, mode="mutual")
# need to reset edge attributes after making the graph undirected
E(our_graph_mutual)$color <- "black"
E(our_graph_mutual)$width <- 0.5
# plot
plot(our_graph_mutual, vertex.label = NA)
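Keeping only reciprocated ties is equivalent to requiring an edge in both directions, i.e., taking the element-wise AND of the adjacency matrix and its transpose. A base-R sketch on a hypothetical 3-person nomination matrix (rows name columns):

```r
people <- c("A", "B", "C")
nominations <- matrix(c(0, 1, 1,
                        1, 0, 0,
                        0, 1, 0),
                      nrow = 3, byrow = TRUE, dimnames = list(people, people))
# A <-> B is reciprocated; A -> C and C -> B are not
mutual <- (nominations > 0 & t(nominations) > 0) * 1
sum(mutual) / 2  # 1 reciprocated friendship (A-B)
```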
That's starting to look a little clearer, but not great.
You might be wondering what's determining where each node appears in the graph.
We can use different layout algorithms that will return coordinates for each node in the graph.
By default, igraph uses a smart function, layout_nicely(), to choose an appropriate layout algorithm for the supplied graph, but you can specify a different one instead.
Force-directed algorithms (e.g., Kamada-Kawai, Fruchterman-Reingold) typically work well for relatively large graphs. The details of how these algorithms work are beyond the scope of this tutorial, but generally, they simulate a physical system in which edges are springs that pull connected nodes toward one another, and nodes act as electrically charged particles that repel one another, more strongly the closer they get. (Kamada-Kawai frames this slightly differently, placing a spring between every pair of nodes whose ideal length is their shortest-path distance in the graph.) The benefit of these algorithms is that they return layouts in which nodes are relatively evenly spaced, and nodes that share many connections sit relatively close to one another.
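To give a feel for the spring-and-charge idea, here is a heavily simplified Fruchterman-Reingold-style sketch in base R. It is not igraph's implementation: the ideal edge length k, the starting temperature, the cooling factor, and the number of iterations are arbitrary choices for illustration, and fr_layout_sketch is a hypothetical helper name. Like igraph's layout functions, it returns an N by 2 matrix of coordinates.

```r
# Simplified Fruchterman-Reingold-style spring embedder (illustrative only).
# A is a symmetric 0/1 adjacency matrix.
fr_layout_sketch <- function(A, n_iter = 200, seed = 1) {
  set.seed(seed)                          # reproducible random start
  n <- nrow(A)
  k <- sqrt(1 / n)                        # ideal edge length in a unit-area frame
  pos <- matrix(runif(2 * n), ncol = 2)   # random starting coordinates
  temp <- 0.1                             # maximum step size ("temperature")
  for (it in seq_len(n_iter)) {
    dx <- outer(pos[, 1], pos[, 1], "-")  # dx[i, j] = x_i - x_j
    dy <- outer(pos[, 2], pos[, 2], "-")
    d <- sqrt(dx^2 + dy^2) + 1e-9         # pairwise distances (avoid dividing by 0)
    # Every pair repels (k^2 / d); connected pairs also attract (d^2 / k)
    force <- k^2 / d - A * d^2 / k
    diag(force) <- 0
    # Net push on node i: sum of forces along the unit vectors from j to i
    disp_x <- rowSums(force * dx / d)
    disp_y <- rowSums(force * dy / d)
    len <- sqrt(disp_x^2 + disp_y^2) + 1e-9
    step <- pmin(len, temp)               # never move farther than the temperature
    pos[, 1] <- pos[, 1] + disp_x / len * step
    pos[, 2] <- pos[, 2] + disp_y / len * step
    temp <- temp * 0.98                   # cool down so the layout settles
  }
  pos
}

# Usage: two triangles (1-2-3 and 4-5-6) joined by the edge 3-4
tri_adj <- matrix(0, 6, 6)
tri_edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
tri_adj[tri_edges] <- 1
tri_adj[tri_edges[, 2:1]] <- 1
tri_coords <- fr_layout_sketch(tri_adj)   # a 6 x 2 coordinate matrix
```

Capping each move by a shrinking temperature is what lets the layout explore early on and then settle, rather than oscillating indefinitely.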
We can try one out on our graph to generate new node coordinates:
coords <- layout_(our_graph_mutual, with_kk())
head(coords)
As you can see, "coords" is just an N (number of nodes) by 2 matrix of x and y coordinates for each node in our graph.
Now we can plot using these coordinates:
plot(our_graph_mutual, vertex.label = NA, layout = coords)
Notice that we have one isolate (i.e., one node without any connections). Force-directed layouts tend to work best with connected graphs, so let's remove the isolate and replot:
# identify isolates in the undirected (mutual ties only) graph
iso <- names(V(our_graph_mutual)[degree(our_graph_mutual)==0])
# remove isolates from undirected graph (for plotting)
our_graph_mutual_no_iso <- delete_vertices(our_graph_mutual, iso)
coords <- layout_(our_graph_mutual_no_iso, with_kk())
plot(our_graph_mutual_no_iso, vertex.label = NA, layout = coords)
Keep in mind that the location of a node within a graph isn't necessarily meaningful. It can be random:
coords <- layout_(our_graph_mutual, randomly())
plot(our_graph_mutual, vertex.label = NA, layout = coords)
Or you could just use a predefined shape, like a circle:
coords <- layout_(our_graph_mutual, in_circle())
plot(our_graph_mutual, vertex.label = NA, layout = coords)
These last 2 layouts aren't very informative, but they remind us that where a node appears in a network plot depends critically on the layout algorithm used to draw it, so be careful about reading too much into node locations.
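A circular layout, for instance, can be produced without looking at the edges at all, which is exactly why it says nothing about who is connected to whom. A minimal base-R sketch (circle_layout_sketch is a hypothetical helper) that spaces n nodes evenly around the unit circle:

```r
# Evenly spaced points on the unit circle; the edges are never consulted
circle_layout_sketch <- function(n) {
  theta <- 2 * pi * (seq_len(n) - 1) / n
  cbind(x = cos(theta), y = sin(theta))
}
circle_layout_sketch(4)
```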
In any case, now we have a somewhat clearer visual depiction of the people in our community and the reciprocally reported friendships between them than what we started with.
If we were interested in ascertaining properties of the network as a whole (e.g., its community structure), we could hone our visualization further by selectively deleting nodes or edges or highlighting communities, but we will return to that later.
Here, our main goal is to visually convey the steps of our study: We characterized the patterning of social ties among members of an academic cohort, and had a subset participate in an fMRI study.
To visually convey the latter point, let's indicate via vertex color the network members who completed our fMRI study. First we'll read in the IDs of the fMRI subjects:
fmri_subj <- fromJSON("./data/fmri/fmri_subjects.json")
Now we'll make a character vector with an element for each subject ID in our network (note: we'll ignore that single isolate for now and just focus on the giant connected component that includes most nodes). The value of a given element will be "tomato" if the corresponding person was a subject in the fMRI study and "grey" otherwise. Then, we'll apply this information as an attribute for all vertices in our graph:
mycols <- ifelse(V(our_graph_mutual_no_iso)$name %in% fmri_subj,
"tomato", "grey")
V(our_graph_mutual_no_iso)$color <- mycols
coords <- layout_(our_graph_mutual_no_iso, with_kk())
plot(our_graph_mutual_no_iso, vertex.label = NA, layout = coords)
The above diagram visually conveys aspects of our experimental design: we characterized the friendships among members of a community, then had a subset of them participate in a neuroimaging study (the red nodes).
However, the visualization does not add much beyond that.
If we wanted a clearer visual indication of a network's structure, we might in some cases delete certain edges (e.g., edges below a threshold in a weighted network). We may also want to visualize the communities that exist in the network. There are many ways to detect communities, that is, parts of the network that have many connections within them but few connections bridging between them.
A common approach is to find the partition that maximizes modularity, which compares the number of edges falling within communities to the number that would be expected by chance in a random network with the same node degrees. To illustrate how to do this, and to explain a bit more about the fundamentals of analyzing social networks, we'll use a different network of fictional characters that may be familiar to many of you: the characters in Game of Thrones.
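Concretely, for an undirected, unweighted network with adjacency matrix A, degrees k_i, m edges, and community labels c_i, modularity is Q = (1/2m) sum_ij [A_ij - k_i k_j / (2m)] delta(c_i, c_j), where delta is 1 when two nodes share a community and 0 otherwise. The base-R sketch below (modularity_sketch is a hypothetical helper, not part of igraph) computes Q for a toy graph of two triangles joined by one edge. igraph's cluster_fast_greedy(), used later in this notebook, greedily merges communities to increase this quantity.

```r
# Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j)
modularity_sketch <- function(A, membership) {
  k <- rowSums(A)                               # node degrees
  two_m <- sum(A)                               # 2m: each edge counted twice
  same <- outer(membership, membership, "==")   # delta(c_i, c_j)
  sum((A - outer(k, k) / two_m) * same) / two_m
}

# Toy graph: triangles 1-2-3 and 4-5-6 joined by the single edge 3-4
mod_adj <- matrix(0, 6, 6)
mod_edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
mod_adj[mod_edges] <- 1
mod_adj[mod_edges[, 2:1]] <- 1

modularity_sketch(mod_adj, c(1, 1, 1, 2, 2, 2))  # 5/14, about 0.357
modularity_sketch(mod_adj, rep(1, 6))            # 0: one big community beats nothing
```

Splitting the graph along the bridge yields a positive Q, while lumping everything together gives 0, which is why modularity maximization favors the two-triangle split.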
The data we'll be using was described in the following paper: A. Beveridge and J. Shan, "Network of Thrones," Math Horizons Magazine, Vol. 23, No. 4 (2016), pp. 18-22.
In this dataset, characters are linked if their names appear within 15 words of each other in the book A Storm of Swords (e.g., because they were in the same place, were talking to each other, were being talked about in the same context, or were hearing about one another), and each edge is weighted by how often this happens.
The process of compiling this dataset is described here: https://networkofthrones.wordpress.com/from-book-to-network/
Let's load in the edge list and take a peek at it:
got_edge_list <- read.csv("./data/socialnetworks/stormofswords.csv")
head(got_edge_list)
And convert it to an igraph object:
got_graph <- graph_from_data_frame(got_edge_list, directed = FALSE)
got_graph
We can see that the network has 107 nodes with 352 edges between them. The 3 letters preceding those numbers mean that the graph is undirected ('U'), that the nodes have names ('N'), and that the edges are weighted ('W').
Let's visualize the network and probe who is particularly well-connected, as well as what communities emerge, as these might be more meaningful for many of you than the relationships among our anonymous participants in the friendship network we used earlier.
# set up function to rescale a numeric vector linearly
# so that its values span [a, b]
scalevals <- function(v, a, b) {
  rng <- range(v)
  # if all values are equal, return the midpoint instead of dividing by 0
  if (rng[1] == rng[2]) return(rep((a + b) / 2, length(v)))
  a + (v - rng[1]) / (rng[2] - rng[1]) * (b - a)
}
# set min and max node sizes
min_size_node <- 2
max_size_node <- 5
# scale node size by degree (number of neighbors) and by
# strength (weighted degree centrality); we plot the latter
nodesize_deg <- scalevals(degree(got_graph), min_size_node, max_size_node)
nodesize_strength <- scalevals(strength(got_graph), min_size_node, max_size_node)
plot(got_graph,
vertex.size = nodesize_strength,
vertex.label = NA,
edge.color="#00000088",
edge.curved=.2)
The Game of Thrones graph consists of a single connected component, but is there a modular community structure within that component, and what does it look like?
# implement fast greedy modularity optimization algorithm
# to find community structure
communities <- cluster_fast_greedy(got_graph)
# assign community membership as vertex attribute
V(got_graph)$community <- communities$membership
# "Dark2" provides at most 8 colors, enough for the communities found here
community_pal <- brewer.pal(max(V(got_graph)$community), "Dark2")
# plot, coloring nodes by community membership
plot(got_graph,
vertex.size = nodesize_strength,
vertex.color = community_pal[V(got_graph)$community],
vertex.label = NA,
edge.color="#00000088",
edge.curved=.2)
You could visualize the communities by "marking" (outlining) them:
plot(communities,
got_graph,
vertex.size = nodesize_strength,
vertex.label = NA,
edge.color="#00000088",
edge.curved=.2)
As you can see, the graph has been partitioned into 7 communities.
You could see who is in which community by removing the "vertex.label = NA" argument above, but with so many labels they would be hard to read. Instead, we can inspect the communities directly. For example, to list the members of the fourth community (reusing the communities object rather than rerunning the algorithm):
communities[[4]]
If you're familiar with these characters at all, you'll see that each community is made up of characters whose storylines unfold together.
Finally, because more of you will be familiar with the relationships among these characters than among the anonymous subjects in our real-world social network, we'll use the Game of Thrones network to quickly illustrate a few more social network measures and, more generally, how to extract information about particular nodes.
For example, we can extract only the edges that meet certain criteria, such as pairs of characters whose names are mentioned together more than 75 times:
E(got_graph)[weight>75]
If you're familiar with these characters, this probably seems reasonable to you.
We can also extract information about individual nodes.
For example, we can list a given node's neighbors (i.e., its direct connections):
neighbors(got_graph, "Hodor")
...And we can check on a particular relationship. For example, have Daenerys and Tyrion met yet in this book?
got_graph["Daenerys", "Tyrion"]
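It doesn't look like it. Because the graph is weighted, indexing it this way returns the weight of the edge between the two characters, so a value of 0 means there is no edge.
We can quickly see who has the most and the fewest links to others by looking at their degree centrality (i.e., the total number of their direct connections):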
head(sort(degree(got_graph), decreasing = TRUE))
head(sort(degree(got_graph), decreasing = FALSE))
Or, since this is a weighted network, we could instead look at strength, or weighted degree centrality. This sums the weights of the edges adjacent to each node (here, the total number of co-occurrences with other characters):
head(sort(strength(got_graph), decreasing = TRUE))
head(sort(strength(got_graph), decreasing = FALSE))
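In matrix terms, degree counts the nonzero entries in a node's row of the adjacency matrix, while strength sums the entries themselves. A base-R sketch on a hypothetical four-character weighted network (names and weights made up) shows how the two can disagree: P has the most neighbors, but Q has the largest total weight.

```r
chars <- c("P", "Q", "R", "S")
Wt <- matrix(0, 4, 4, dimnames = list(chars, chars))
Wt["P", "Q"] <- Wt["Q", "P"] <- 10  # P and Q co-occur often
Wt["P", "R"] <- Wt["R", "P"] <- 1
Wt["P", "S"] <- Wt["S", "P"] <- 1
Wt["Q", "R"] <- Wt["R", "Q"] <- 3

deg <- rowSums(Wt > 0)  # degree: number of distinct neighbors -> P 3, Q 2, R 2, S 1
strn <- rowSums(Wt)     # strength: sum of adjacent weights    -> P 12, Q 13, R 4, S 1
```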
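You could of course extract many other centrality measures as well, such as betweenness centrality: for each node, the sum, over all pairs of other nodes, of the fraction of shortest paths between them that pass through that node.
Let's see who has high betweenness centrality, in other words, who bridges between otherwise disparate areas of the network. One caveat: when a graph has a 'weight' edge attribute, igraph's betweenness() uses it by default and treats the weights as distances, so characters who co-occur often are treated as being far apart. Pass weights = NA if you would rather ignore the weights: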
head(sort(betweenness(got_graph, normalized = TRUE), decreasing = TRUE))
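We can then look at the neighbors of particular characters, such as Aegon and Ramsay, to see which parts of the network they connect:
neighbors(got_graph, "Aegon")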
neighbors(got_graph, "Ramsay")
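For unweighted graphs, betweenness can be computed efficiently with Brandes' algorithm: run a breadth-first search from each node, count shortest paths, then accumulate each node's share of those paths while walking back from the farthest nodes. The base-R sketch below (brandes_betweenness is a hypothetical helper) ignores edge weights, so for an undirected graph it corresponds to betweenness(g, weights = NA) rather than to the weighted call above, and it is written for clarity rather than speed.

```r
# Brandes' algorithm for an unweighted, undirected graph given as a 0/1 matrix
brandes_betweenness <- function(A) {
  n <- nrow(A)
  cb <- numeric(n)
  for (s in seq_len(n)) {
    # Breadth-first search from s, counting shortest paths
    visited <- integer(0)               # nodes in order of discovery
    preds <- vector("list", n)          # shortest-path predecessors
    sigma <- numeric(n); sigma[s] <- 1  # number of shortest paths from s
    hops <- rep(-1, n); hops[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]; queue <- queue[-1]
      visited <- c(visited, v)
      for (w in which(A[v, ] != 0)) {
        if (hops[w] < 0) {              # first time we reach w
          queue <- c(queue, w)
          hops[w] <- hops[v] + 1
        }
        if (hops[w] == hops[v] + 1) {   # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
          preds[[w]] <- c(preds[[w]], v)
        }
      }
    }
    # Walk back from the farthest nodes, accumulating dependencies
    delta <- numeric(n)
    for (w in rev(visited)) {
      for (v in preds[[w]]) {
        delta[v] <- delta[v] + sigma[v] / sigma[w] * (1 + delta[w])
      }
      if (w != s) cb[w] <- cb[w] + delta[w]
    }
  }
  cb / 2  # undirected: every pair was counted once from each end
}

# Path 1-2-3-4: each middle node lies on 2 shortest paths
path4 <- matrix(0, 4, 4)
path4[cbind(1:3, 2:4)] <- 1; path4[cbind(2:4, 1:3)] <- 1
brandes_betweenness(path4)  # 0 2 2 0

# Star with hub 1: normalizing by (n - 1)(n - 2) / 2 gives the hub a score of 1
star5 <- matrix(0, 5, 5); star5[1, 2:5] <- 1; star5[2:5, 1] <- 1
brandes_betweenness(star5) / ((5 - 1) * (5 - 2) / 2)  # 1 0 0 0 0
```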
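Another common measure is eigenvector centrality, which scores a node highly when it is connected to other nodes that are themselves highly central. Here are the characters with the highest and lowest scores:
head(sort(eigen_centrality(got_graph)$vector, decreasing=TRUE))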
head(sort(eigen_centrality(got_graph)$vector, decreasing=FALSE))
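Under the hood, eigenvector centrality is the leading eigenvector of the adjacency matrix. A base-R sketch, assuming an undirected, connected graph so that the leading eigenvector's entries all share one sign (eigen_centrality_sketch is a hypothetical helper); scores are rescaled so the top node gets 1, which is also how igraph's eigen_centrality() reports them by default:

```r
eigen_centrality_sketch <- function(A) {
  ev <- eigen(A, symmetric = TRUE)  # eigenvalues come back in decreasing order
  v <- abs(ev$vectors[, 1])         # leading eigenvector (its sign is arbitrary)
  v / max(v)                        # rescale so the most central node scores 1
}

# Star graph: the hub is central, each leaf borrows half of its importance
hub_star <- matrix(0, 5, 5); hub_star[1, 2:5] <- 1; hub_star[2:5, 1] <- 1
eigen_centrality_sketch(hub_star)  # 1.0 0.5 0.5 0.5 0.5
```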
Hopefully, the above exercises gave you a sense of how to input, visualize, and interpret social network data.
In the remaining 2 notebooks in this tutorial, we'll focus on the same real-world friendship network that you visualized above (and below, for convenience):
coords <- layout_(our_graph_mutual_no_iso, with_kk())
plot(our_graph_mutual_no_iso, vertex.label = NA, layout = coords)
Next, we'll extract neural response time series from each of the red nodes (in the next notebook).
Finally, we'll relate the neural and social network data (in the last notebook).